A Peer-to-Peer Protocol and System Architecture for Privacy-Preserving Statistical Analysis
Authors
Abstract
The insights gained from large-scale analysis of health-related data can have an enormous impact on public health and medical research, but access to such personal and sensitive data has serious privacy implications for the data provider and places a heavy data security and administrative burden on the data consumer. In this paper we present an architecture that fills the gap between the statistical tools ubiquitously used in medical research on the one hand, and privacy-preserving data mining methods on the other. This architecture foresees the primitive instructions needed to re-implement the elementary statistical methods so that they only access data via a privacy-preserving protocol. The advantage is that more complex analysis and visualisation tools built upon these elementary methods can remain unaffected. Furthermore, we introduce RASSP, a secure summation protocol that implements the primitive instructions foreseen by the architecture. An open-source reference implementation of this architecture is provided for the R language. We use these results to argue that the tension between medical research and privacy requirements can be technically alleviated, and we outline a research plan towards a system that covers further requirements on computational efficiency and on the trust that the medical researcher can place in the statistical results it produces.
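To make the idea of building elementary statistics on a summation primitive concrete, the following is a minimal sketch in Python. It is not RASSP and not the R reference implementation mentioned above, whose details the abstract does not give. It assumes a generic additive secret-sharing scheme, simulates all parties in a single process, and restricts inputs to non-negative integers whose total stays below a public modulus. The names `MODULUS`, `make_shares`, `secure_sum`, and `private_mean_and_variance` are hypothetical.

```python
import secrets

# Assumed public modulus; every true total must stay below it.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split an integer into n additive shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values, modulus=MODULUS):
    """Simulate one round of additive-sharing summation in a single process."""
    n = len(private_values)
    # Row i holds the shares that party i hands out, one per party.
    share_matrix = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(row[j] for row in share_matrix) % modulus for j in range(n)
    ]
    # Combining the published partial sums reveals only the overall total.
    return sum(partial_sums) % modulus


def private_mean_and_variance(private_values):
    """Derive the mean and sample variance from two summation calls."""
    n = len(private_values)
    total = secure_sum(private_values)
    total_of_squares = secure_sum([v * v for v in private_values])
    mean = total / n
    variance = (total_of_squares - n * mean**2) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    # Hypothetical per-patient measurements, one value per party.
    readings = [120, 135, 128, 142]
    print(private_mean_and_variance(readings))
```

In this sketch the statistical routine touches the data only through `secure_sum`, which mirrors the abstract's point that higher-level analysis code can stay unchanged once the elementary methods are rebuilt on a privacy-preserving primitive. A real deployment would also need authenticated channels between peers and protection against colluding parties, which this simulation does not address.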
Related resources
Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks
This paper proposes a scalable, local privacy-preserving algorithm for distributed peer-to-peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions...
Lightweight Privacy-Preserving Peer-to-Peer Data Integration
Peer Data Management Systems (PDMS) are an attractive solution for managing distributed heterogeneous information. When a peer (client) requests data from another peer (server) with a different schema, translations of the query and its answer are done by a sequence of intermediate peers (translators). There are two privacy issues in this P2P data integration process: (i) answer privacy: no unau...
P2P collaborative filtering with privacy
With the evolution of the Internet and e-commerce, collaborative filtering (CF) and privacy-preserving collaborative filtering (PPCF) have become popular. The goal in CF is to generate predictions with decent accuracy, efficiently. The main issue in PPCF, however, is achieving such a goal while preserving users’ privacy. Many implementations of CF and PPCF techniques proposed so far are central...
Privacy-preserving Distributed Analytics: Addressing the Privacy-Utility Tradeoff Using Homomorphic Encryption for Peer-to-Peer Analytics
Data is becoming increasingly valuable, but concerns over its security and privacy have limited its utility in analytics. Researchers and practitioners are constantly facing a privacy-utility tradeoff where addressing the former is often at the cost of the data utility and accuracy. In this paper, we draw upon mathematical properties of partially homomorphic encryption, a form of asymmetric key...
Privacy-Preserving Friends Troubleshooting Network
Content sharing is a popular use of peer-to-peer systems because of their inherent scalability and low cost of maintenance. In this paper, we leverage this nature of peer-to-peer systems to tackle a new problem: automatic misconfiguration troubleshooting. In this setting, machine configurations from peers are shared to diagnose misconfigurations on a sick machine. The key challenges are preservi...